Week 07: Code Generation & Operational Semantics

Code Generation

Introduction

Goal

Generating code for a stack machine with the accumulator. Run the code on a real machine, e.g., the MIPS processor.

Concept

  • The accumulator is kept in MIPS register $a0.
  • Keeps the stack in memory.
  • The address of the next location on the stack is kept in MIPS register $sp
    • The top of the stack is at address $sp + 4

MIPS

We will compile the source code from Cool to MIPS assembly to execute it.

MIPS

  • RISC
  • Operation uses registers for operands and results.
  • Use load and store instructions to manipulate the value in memory.
  • 32 general purpose registers x 32bits per register

MIPS Instruction

  • lw reg_1 offset(reg_2): Load 32-bit word in address reg_2 + offset into register reg_1
  • sw reg_1 offset(reg_2): Store 32-bit word in register reg_1 into address reg_2 + offset
  • add reg_tar reg_src reg_src2
  • addiu reg_tar reg_src imm: reg_tar = reg_src + imm (u means overflow is not checked.)
  • li reg imm: reg = imm
  • beq reg1 reg2 label: If reg1 = reg2, then jump to label.
  • b label: Jump to label.

Sample for 7+5

  • In compiler
acc <- 7
push acc

acc <- 5
acc <- acc + top_of_stack

pop
  • Code gen
li $a0 7
sw $a0 0($sp)
addiu $sp $sp -4
li $a0 5
lw $t1 4($sp)
add $a0 $a0 $t1
addiu $sp $sp 4

Code Generation

How it works

For expression e, generate MIPS code

  • Compute the value of e in $a0
  • Preserve $sp and the contents of the stack

Define a code generation function cgen(e) whose result is the code generated for e.

The code in source

cgen(e1 + e2) =
cgen(e1)
print "sw $a0 0($sp)"
print "addiu $sp $sp -4"
cgen(e2)
print "lw $t1 4($sp)"
print "add $a0 $t1 $a0"
print "addiu $sp $sp 4"

Remarks

  • Before entering the another cgen function, you should not save anything in the temporal register. Because the temporal register may be used after. Just use the stack machine.
  • Stack machine code generation is recursive.
  • Code generation can be written as a recursive-descent of the AST.
  • In different language, code for function calls and function definitions depends on the layout of the AR.

Simple AR for Cool

  • The result is always in the accumulator.
  • The activation record holds actual parameters.
  • The stack discipline guarantees that on function exit $sp is the same as it was on function entry.

On function exit

  • We need the return address to jump back.
  • A pointer to the current activation is useful.
  • This pointer lives in register $fp (frame pointer)
  • When entering a call to f(x,y) the AR of f will be pushed into the stack:
    • old fp
    • y
    • x
    • Return Address

How do we manage the variable's location

  • Problem: Because the stack grows when intermediate results are saved, the variables are not at a fixed offset from $sp
  • Solution
    • Tutor: use a frame pointer
    • I use another stack in code gen to save and search the variables in the stack.
      • load from $sp + (Stack_Size - Variable_Index)
  • The activation record must be designed together with the code generator
  • Code generation can be done by recursive traversal of the AST

Temporaries

How do we estimate the stack's size?

Basic

Let NT(e) = # of temps needed to evaluate the expression e

  • NT(e1 + e2)
    • Needs at least as many temporaries as NT(e1)
    • Needs at least as many temporaries as NT(e2) + 1
    • Space used for temporaries in e1 can be reused for temporaries in e2
    • NT(e1 + e2) = max(NT(e1), 1 + NT(e2))
  • NT(e1 - e2) = max(NT(e1), 1 + NT(e2))
  • NT(if e1 = e2 then e3 else e4) = max(NT(e1),1 + NT(e2), NT(e3), NT(e4))
  • NT(id(e1,...,en) = max(NT(e1),...,NT(en))
  • NT(int) = 0
  • NT(id) = 0

Function

For a function definition f(x1,...,xn) = e the AR has 2 + n + NT(e) elements

  • Return Address $ra
  • Frame Pointer $fp
  • n arguments
  • NT(e) locations for intermediate results

Tempories in AR

  • old fp
  • xn
  • ...
  • x1
  • RA
  • Temp NT(e)
  • ...
  • Temp 1

Remarks

  • Code generation must know how many temporaries are in use at each point
  • Add a new argument to code generation
  • the position of the next available temporary
  • The temporary area is used like a small, fixed-size stack

Object Layout

Basic

  • OO implementation = Basic code generation + More stuff
  • If B inherits A, code in class A works unmodified for an object of class B
  • Each attribute stored at a fixed offset in the object
  • The attribute is in the same place in every object of that class and inherited classes.

Object Layout

  • Offset +0: Class Tag (Int)
  • Offset +4: Object Size (Int)
  • Offset +8: Pointer to Class Method Table (Address)
  • Offset +12, 16, ...: Pointer to Attribute (Address)
    • If C inherits B and B inherits A, then the sequence is A's attributes, B's attributes, and C's attributes.

Class Method Table Layout

  • If C inherits B and B inherits A
  • A contains f1, f2, f3
  • B contains f2, f3, f4, f5
  • C contains f3, f5, f6
  • Class C Method Table Layout
    • A's method f1: A.f1
    • A's method f2 overrided by B: B.f2
    • A's method f3 overrided by C: C.f3
    • B's method f4: B.f4
    • B's method f5 overrided by C: C.f3
    • C's method f6: C.f6

Cgen Semantics

Semantics Overview

Rules in every step

  • The tokens $\Rightarrow$ lexical analysis
  • The grammar $\Rightarrow$ syntactic analysis
  • The typing rules $\Rightarrow$ semantic analysis
  • The evaluation rules$\Rightarrow$ code generation and optimization
    • The compilation of Cool to a stack machine.

Consideration in Assembly-language description

  • Whether to use a stack machine or not
  • Which way the stack grows
  • How integers are represented
  • The particular instruction set of the architecture

How to specify evaluation rules?

  • Many ways to specify semantics
  • Operational Semantics (What we use in Cool)
    • Describes program evaluation via execution rules
      • on an abstract machine
    • Most useful for specifying implementations
  • Denotational Semantics
    • Program’s meaning is a mathematical function
  • Axiomatic Semantics
    • Program behavior described via logical formulae
      • If execution begins in state satisfying X, then it ends in state satisfying Y
      • X, Y formulas
    • Foundation of many program verification systems

Operational Semantics

TL; DR

Similar to typing rules. Read Coool Manual Ch.13 Operational semantics.


In [ ]: